ABSTRACT


Andrews et al. (1972) carried out an extensive Monte Carlo
study of robust estimators of location. They concluded that the
hampel and the skipped estimates, as classes, seemed preferable to
some of the other currently fashionable estimators. The present
study extends this work to include estimators not previously
examined. The estimators are compared over short-tailed as well as
long-tailed alternatives and also over some dependent data
generated by first-order autoregressive schemes.

The conclusions of the present study are threefold. First, from
our limited study, none of the so-called robust estimators is very
robust over short-tailed situations. More work seems to be
necessary in this situation. Second, none of the estimators performs
very well in dependent data situations, particularly when the
correlation is large and positive. This seems to be a rather
pressing problem. Finally, for long-tailed alternatives, the
hampel estimators and Hogg-type adaptive versions of the hampels are
the strongest classes. The adaptive hampels neither uniformly
outperform the hampels nor are uniformly outperformed by them.
However, the superiority in terms of maximum relative efficiency
goes to the adaptive hampels. That is, the adaptive hampels, under
their worst performance, are superior to the usual hampels, under
their worst performance.


1. INTRODUCTION

Andrews et al. (1972) have published a very extensive and
informative Monte-Carlo study of robust estimation of location in a
symmetric probability density. This study involved some 68
estimators of location as well as 14 distinct sampling distributions.
Sample sizes used were 5, 10, 20 and 40, although for most of the
sampling situations only the sample size of 20 was investigated.
For the estimators and sampling situations examined, they provide
a very complete and satisfactory picture.

We are motivated, however, to supplement this study for three
basic reasons. In Andrews et al. (1972, p. 67), the Princeton
Robustness Study, hereafter referred to as PRS, short-tailed
sampling situations are ruled out of consideration with the statement,
"Robustness for short-tailed distributions was thought to be a
rather special case, arising in situations that are usually rather
easily recognized in practice." It is our contention that
short-tailed data do arise in practice. For example, Professor R. V.
Hogg points out that studies at the Iowa Testing Service, home of
the ACT college entrance examination, clearly show that scores on
these examinations tend to arise from short-tailed distributions.
Our personal experience with instrumentation data from U.S. Naval
projects also convinces us that data passing through one or more
sets of electronic instruments often tend to be characterized by a
short-tailed distribution. Moreover, since some of the location
estimators were designed to protect against short-tailed
alternatives as well as long-tailed possibilities, we believe that
examination of long-tailed alternatives alone faults such
estimators unfairly, rather akin to discovering that a two-sided test
is not most powerful against one-sided alternatives. Even ignoring
this aspect, the question remains, "What are desirable estimators
given short-tailed alternatives?"

A second motivation was provided by the rather dismal
situation in the time series context. The sample mean is routinely
shown to be a consistent estimator of the "center" of a stationary
time series, with the clear implication that a time series is
centered by subtracting the sample mean. One might suggest that one
first remove the autocorrelation by fitting some sort of
autoregressive (AR) or moving average (MA) scheme and then use
$\bar{X}$ to estimate the center of the errors, which in the
stationary case is the same as the center of the time series.
Unfortunately, to fit an AR or an MA scheme requires that we have a
zero-mean time series, i.e. a centered time series. Thus a robust
estimate of location in a time series context is a matter of
practical concern. If the sample mean is unsatisfactory for
independent but non-normal data, how much worse must it be for
correlated data?

Our third motivation arises from a personal conviction that
adaptive estimators, properly formulated, ought to be very
successful. In particular, we observe that because of the
contributions of Professors Hampel and Huber to the PRS, the so-called
hampel and closely related M-estimators were extensively studied.
At least 25 of the 68 estimators studied involve either a hampel or
an M-estimator. The M-estimators, and particularly the hampels,
perform very well both because they are good estimators and because
they were fine-tuned. The adaptive estimators, however, were not
similarly fine-tuned and do not fare as well in the final analysis.
To state our conviction succinctly, if hampels are good, adaptive
hampels should be better.

The paper is divided into five parts. Section 2 lists the
estimators studied in this paper and presents a short discussion.
Section 3 discusses the details of the Monte-Carlo aspects of this
work while section 4 contains the tables of results. Finally,
section 5 contains our reactions to and conclusions about the
results. We note here that this paper is not intended to compete in
scope or in detail with the PRS. The latter was a long and arduous
study. We believe that, because of this, the PRS has formulated and
will continue to formulate directions of research for some time,
unfortunately, in our view, away from adaptive estimators. Our
intention is, therefore, to present evidence that adaptive
estimators are at least competitive. Our adaptive procedures are
based on experimentation and intuition and should be taken as
first-generation refinements. We are quite sure adaptive estimators
can be further improved.

2.  THE  ESTIMATORS 

In choosing which estimators to examine in this study, we were
guided by the results of the PRS. As standards of comparison, we
chose M-estimators, the related hampels, and the trimmed means.
Against these standards, we measured various forms of adaptive
estimators and several miscellaneous estimators, including the
Hodges-Lehmann, the Normal Scores Rank Estimator, Johns', and an
estimator based on skipping. We follow the routine established in
the PRS by listing the estimators together with their mnemonic
codes in Table I. We note here that the codes described below agree
with those in the PRS when an estimator is common to both works.


Trimmed  Means 

A simple scheme for robustifying the mean is to eliminate
"extreme" observations. The 100α% symmetric trimmed mean
discards the [(N+1)α] largest and [(N+1)α] smallest observations
and computes the mean of the remaining observations. Here, [·]
is the greatest integer function. The outer mean, OM, is sometimes
used in short-tailed situations and is computed as the mean of the
trimmings with α = .25.
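As an illustration of the trimming rule just described, the following
minimal Python sketch computes a symmetric trimmed mean and the outer
mean. The function names are ours, and the fallback to the median when
trimming removes every observation is an assumption made so that the
"50%" case returns the median; it is not taken from the PRS.

```python
import math
import statistics


def trim_count(n, alpha):
    """Observations removed from each end: integer part of (n + 1) * alpha."""
    return math.floor((n + 1) * alpha)


def trimmed_mean(x, alpha):
    """Symmetric 100*alpha% trimmed mean of the sample x."""
    xs = sorted(x)
    k = trim_count(len(xs), alpha)
    kept = xs[k:len(xs) - k]
    # Assumption: if trimming removes everything (the "50%" case),
    # fall back to the sample median.
    return statistics.mean(kept) if kept else statistics.median(xs)


def outer_mean(x, alpha=0.25):
    """Mean of the trimmings, i.e. of the values alpha-trimming discards.

    Requires a sample large enough that at least one value is trimmed
    from each end.
    """
    xs = sorted(x)
    k = trim_count(len(xs), alpha)
    trimmings = xs[:k] + xs[len(xs) - k:]
    return statistics.mean(trimmings)


sample = [1, 2, 3, 4, 100]
middle = trimmed_mean(sample, 0.25)   # drops 1 and 100
outer = outer_mean(sample)            # averages 1 and 100
```

The example makes the contrast visible: the trimmed mean ignores the
extreme value, while the outer mean is built entirely from the extremes,
which is why it can only be attractive for short-tailed data.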


Huber's  M-Estimators  and  Hampels 

M-estimators of location are solutions, T, of the equation

$$\sum_{j=1}^{n} \psi\!\left(\frac{X_j - T}{S}\right) = 0.$$

Hampel (1974) proposed estimating the scale, S, with


TABLE I

A Brief Description of the 19 Estimators of Location Together With
a Mnemonic Code for Each

Number  Code     Short Description

  1     10%      10% symmetrically trimmed mean
  2     50%      50% symmetrically trimmed mean (median)
  3     M        Mean
  4     OM       Outer mean, mean of trimmings after 25% symmetrical
                 trimming
  5     12A      One-step hampel M-estimate, ψ bends at 1.2, 3.5, 8.0
  6     17A      One-step hampel M-estimate, ψ bends at 1.7, 3.4, 8.5
  7     21A      One-step hampel M-estimate, ψ bends at 2.1, 4.0, 8.2
  8     25A      One-step hampel M-estimate, ψ bends at 2.5, 4.5, 9.5
  9     ADA      Adaptive hampel M-estimate, ψ bends at ADA, 4.5, 8.0
 10     HG1      Hogg-type adaptor using trimmed means 38%, 19%, M, OM
 11     HG2      Hogg-type adaptor using trimmed means 38%, 25%, 10%
 12     1.81*A   Hogg-type adaptor using hampels 25A, 21A, 12A
 13     1.90A    Hogg-type adaptor using hampels 25A, ADA, 17A
 14     1.95A    Hogg-type adaptor using hampels 25A, ADA, 17A
 15     2.00A    Hogg-type adaptor using hampels 21A, 12A
 16     H/L      Hodges-Lehmann estimator
 17     RN       Normal scores rank estimator
 18     JOH      Johns' adaptive estimator
 19     5T4      Multiply-skipped mean, max(5k,2) < .6N deleted



$\mathrm{med}_i\,|x_i - 50\%| / .6745$, where 50% denotes the sample
median. The hampel estimators are M-estimators with ψ given by

$$\psi(x) = \operatorname{sgn}(x)\cdot
\begin{cases}
|x|, & 0 \le |x| < a \\
a, & a \le |x| < b \\
a\,\dfrac{c - |x|}{c - b}, & b \le |x| < c \\
0, & |x| \ge c.
\end{cases}$$

The parameters a, b and c are given below together with the code
symbol.

Code     a      b      c

12A     1.2    3.5    8.0
17A     1.7    3.4    8.5
21A     2.1    4.0    8.2
25A     2.5    4.5    9.5

ADA is a mildly adaptive form of the hampel.

The hampels examined in this work are one-step estimators (cf.
Bickel (1975)); that is, they are not the exact root of

$$\sum_{i=1}^{n} \psi\!\left(\frac{X_i - T}{S}\right) = 0,$$

but the result of one iteration of the Newton-Raphson method using
50% (the median) as starting value. More complete details of
these estimators may be found in the PRS.
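The one-step construction can be sketched in Python as follows. This is
a minimal illustration under stated assumptions, not the PRS code: the
function names are ours, the scale is the median absolute deviation from
the median divided by .6745, and the guards for a zero scale or a
non-positive Newton denominator (both returning the median) are our own
additions.

```python
import statistics


def hampel_psi(u, a, b, c):
    """Hampel's three-part redescending psi with bends at a, b, c."""
    au = abs(u)
    sign = 1.0 if u >= 0 else -1.0
    if au < a:
        return u
    if au < b:
        return sign * a
    if au < c:
        return sign * a * (c - au) / (c - b)
    return 0.0


def hampel_psi_prime(u, a, b, c):
    """Derivative of hampel_psi away from the bend points."""
    au = abs(u)
    if au < a:
        return 1.0
    if au < b:
        return 0.0
    if au < c:
        return -a / (c - b)
    return 0.0


def one_step_hampel(x, a, b, c):
    """One Newton-Raphson step from the median toward the root of
    sum(psi((x_i - T) / S)) = 0."""
    t0 = statistics.median(x)
    s = statistics.median([abs(xi - t0) for xi in x]) / 0.6745
    if s == 0:
        return t0  # assumption: degenerate scale, keep the median
    u = [(xi - t0) / s for xi in x]
    num = sum(hampel_psi(ui, a, b, c) for ui in u)
    den = sum(hampel_psi_prime(ui, a, b, c) for ui in u)
    if den <= 0:
        return t0  # assumption: step undefined, keep the median
    return t0 + s * num / den


# 17A uses bends at 1.7, 3.4, 8.5 (Table I).
clean = one_step_hampel([-2, -1, 0, 1, 2], 1.7, 3.4, 8.5)
spoiled = one_step_hampel([-2, -1, 0, 1, 100], 1.7, 3.4, 8.5)
```

In the second call the value 100 lies beyond c in standardized units, so
it receives zero weight and the step lands on the mean of the remaining
four observations.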

Hogg -Type  Adaptors 

Hogg (1974) has suggested an adaptation procedure which
chooses among two or more estimates of the center depending on the
value of some statistic chosen to measure tail length. Hogg
originally suggested use of the kurtosis, while more recent
suggestions include

$$Q_1 = \frac{U(.05) - L(.05)}{U(.50) - L(.50)}, \qquad
  Q_2 = \frac{U(.20) - L(.20)}{U(.50) - L(.50)},$$

where U(α) (L(α)) is the mean of the largest (smallest) [(N+1)α]
observations. The exact form of the various adaptors is given
below. Here k% denotes a k% symmetrically trimmed mean.


Code      Formulation

HG1       T = 38%    if Q1 > 3.2
              19%    if 2.6 < Q1 ≤ 3.2
              M      if 2.0 < Q1 ≤ 2.6
              OM     if Q1 ≤ 2.0

HG2       T = 38%    if Q2 > 1.87
              25%    if 1.81 ≤ Q2 ≤ 1.87
              10%    if Q2 < 1.81

1.81*A    T = 21A    if 1.81 < Q2 ≤ 1.87
              25A    if Q2 ≤ 1.81
              12A    if Q2 > 1.87

1.90A     T = ADA    if 1.90 < Q2 ≤ 2.05
              25A    if Q2 ≤ 1.90
              17A    if Q2 > 2.05

1.95A     T = ADA    if 1.95 < Q2 ≤ 2.10
              25A    if Q2 ≤ 1.95
              17A    if Q2 > 2.10

2.00A     T = 21A    if Q2 ≤ 2.00
              12A    if Q2 > 2.00

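To make the selection rule concrete, here is a minimal Python sketch of
the tail-length statistic and of the HG2 adaptor, which switches among
the 38%, 25% and 10% trimmed means according to Q2. The function names
are ours, and the error raised for samples too small to form a tail mean
is an assumption added for safety.

```python
import math
import statistics


def tail_means(x, alpha):
    """Return (U(alpha), L(alpha)): the means of the largest and of the
    smallest [(N + 1) * alpha] observations."""
    xs = sorted(x)
    k = math.floor((len(xs) + 1) * alpha)
    if k < 1:
        raise ValueError("sample too small for this alpha")  # our guard
    return statistics.mean(xs[-k:]), statistics.mean(xs[:k])


def q_statistic(x, alpha):
    """Q1 for alpha = .05, Q2 for alpha = .20."""
    upper, lower = tail_means(x, alpha)
    upper_half, lower_half = tail_means(x, 0.50)
    return (upper - lower) / (upper_half - lower_half)


def hg2_trim_fraction(q2):
    """Trimming proportion selected by HG2 for a given Q2."""
    if q2 > 1.87:
        return 0.38
    if q2 >= 1.81:
        return 0.25
    return 0.10


def trimmed_mean(x, alpha):
    xs = sorted(x)
    k = math.floor((len(xs) + 1) * alpha)
    return statistics.mean(xs[k:len(xs) - k])


def hg2(x):
    """HG2 estimate: the trimmed mean chosen by the value of Q2."""
    return trimmed_mean(x, hg2_trim_fraction(q_statistic(x, 0.20)))


flat = list(range(1, 21))            # evenly spaced, short tails
stretched = list(range(1, 20)) + [100]  # one far-out observation
q2_flat = q_statistic(flat, 0.20)
q2_stretched = q_statistic(stretched, 0.20)
```

For the evenly spaced sample Q2 is small and HG2 trims lightly; a single
far-out observation raises Q2 enough to select the heaviest trimming.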

Rank  Estimators 

In what follows, let $R_i(\theta)$ denote the signed ranks of
$x_i - \theta$. That is, we rank $x_1 - \theta, x_2 - \theta, \ldots,
x_n - \theta$ according to magnitude (but not sign), and $R_i(\theta)$
is the product of $\operatorname{sgn}(x_i - \theta)$ and the rank of
$|x_i - \theta|$. Further let us define the score function $J(u)$
through $\Phi^{-1}$, where $\Phi$ is the normal distribution
function. The rank statistic with normal scores, (RN), is taken as
the root, T, of the equation

$$J^*(T) - \bar{J} = 0,$$

where

$$J^*(T) = n^{-1} {\sum}^{*}\, J\!\left(\frac{R_i(T)}{n+1}\right)
\qquad \text{and} \qquad
\bar{J} = n^{-1} \sum_{i=1}^{n} J\!\left(\frac{i}{n+1}\right).$$

The symbol $\sum^{*}$ indicates the summation over all positive ranks.
Roots were found by the method of bisection using a maximum of
twelve iterations. If $\Phi$ is taken as uniform, the resulting rank
statistic is asymptotically equivalent to the Hodges-Lehmann
estimator (cf. Hodges and Lehmann (1965)). Details of Johns (1974)
and the skipped estimate may be found in the PRS.
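For reference, the Hodges-Lehmann estimator mentioned above can be
computed directly as the median of the pairwise (Walsh) averages. The
sketch below is a minimal illustration with a function name of our own
choosing; it uses the common one-sample convention that includes each
observation paired with itself, which is an assumption about the exact
variant used in this study.

```python
import statistics


def hodges_lehmann(x):
    """Median of the Walsh averages (x_i + x_j) / 2 over all i <= j."""
    n = len(x)
    walsh = [(x[i] + x[j]) / 2.0 for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)


estimate = hodges_lehmann([1, 2, 10])
```

With n = 20 there are only 210 Walsh averages, so the direct
enumeration is inexpensive at the sample size used here.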

3.  MONTE-CARLO  DETAILS 

In addition to the long-tailed sampling situations
investigated in the PRS, we also investigated short-tailed situations
and situations involving dependent data. Pseudo-random numbers were
generated by a multiplicative congruential generator. Cycle time
was constructed to be considerably larger than the number of
observations needed for this study. We also checked for bias,
variance, and serial correlation, and the generator was suitable in
these respects. (See the PRS for description and cautions in the use
of these generators.) The outputs, scaled between 0 and 1, form the
basis for all subsequent calculations since they approximate a
uniformly distributed sample. Normal pseudo-random deviates were
generated according to the well-known Box-Muller transform (cf.
Paley and Wiener (1954), p. 146) and Cauchy pseudo-random deviates
were generated by the tangent transform. These three basic
distributions, or mixtures thereof, form the basis of all results
reported in this work.
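The generation scheme described above can be sketched in Python as
follows. The multiplier 16807 and modulus 2^31 - 1 are a well-known
textbook choice used here purely for illustration; the constants
actually used in this study are not stated, so they should be read as
assumptions, as should all names in the sketch.

```python
import math


class MultiplicativeCongruential:
    """x_{k+1} = a * x_k mod m, returned scaled into (0, 1)."""

    def __init__(self, seed=12345, a=16807, m=2**31 - 1):
        # Illustrative constants only; not those of the study.
        self.state = seed % m
        self.a = a
        self.m = m

    def uniform(self):
        self.state = (self.a * self.state) % self.m
        return self.state / self.m


def box_muller(u1, u2):
    """Two independent N(0,1) deviates from two uniforms on (0, 1)."""
    r = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return r * math.cos(angle), r * math.sin(angle)


def cauchy_from_uniform(u):
    """Tangent transform: a standard Cauchy deviate from one uniform."""
    return math.tan(math.pi * (u - 0.5))


gen = MultiplicativeCongruential(seed=1)
uniforms = [gen.uniform() for _ in range(20)]
z1, z2 = box_muller(uniforms[0], uniforms[1])
c = cauchy_from_uniform(uniforms[2])
```

A contaminated sample such as N+.10C would then be built by drawing, for
each observation, a further uniform to decide which component supplies
the deviate.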

The  basic  sampling  situations  are  summarized  in  Table  II. 


TABLE II

Sampling Situations for Independent Observations (n = 20). N/U is a
Normal (0,1) Variate Divided by an Independent Uniform Variate

Distribution                                             Code

Uniform, mean 0, variance 1 (denoted U(0,1))             U
97% Uniform (0,1) + 3% Normal (0,1) contamination        U+.03N(0,1)
90% Uniform (0,1) + 10% Normal (0,1) contamination       U+.10N(0,1)
50% Uniform (0,1) + 50% Normal (0,1) contamination       U+.50N(0,1)
Normal, mean 0, variance 1                               N(0,1)=N
99% Normal (0,1) + 1% Normal (0,9) contamination         N+.01N(0,9)
97.5% Normal (0,1) + 2.5% Normal (0,9) contamination     N+.025N(0,9)
95% Normal (0,1) + 5% Normal (0,9) contamination         N+.05N(0,9)
90% Normal (0,1) + 10% Normal (0,9) contamination        N+.10N(0,9)
75% Normal (0,1) + 25% Normal (0,9) contamination        N+.25N(0,9)
95% Normal (0,1) + 5% Normal (0,100) contamination       N+.05N(0,100)
90% Normal (0,1) + 10% Normal (0,100) contamination      N+.10N(0,100)
75% Normal (0,1) + 25% Normal (0,100) contamination      N+.25N(0,100)
97% Normal (0,1) + 3% Cauchy contamination               N+.03C
90% Normal (0,1) + 10% Cauchy contamination              N+.10C
50% Normal (0,1) + 50% Cauchy contamination              N+.50C
Cauchy, median = 0                                       C
90% Normal (0,1) + 10% N/U contamination                 N+.10N/U
75% Normal (0,1) + 25% N/U contamination                 N+.25N/U


In addition to the sets of independent observations discussed
above, we generated observations according to a first-order
autoregressive scheme, $X_j = \rho X_{j-1} + \epsilon_j$,
$j = 1, 2, \ldots, 20$. The shock, $\epsilon_j$, was chosen according
to either N, U or C and the correlation coefficient, $\rho$, as
either .2, .5 or .9. The initial value, $X_0$, was chosen as 0. All
of the 19 estimators discussed in section 2 were investigated in
the 19 independent sampling situations. Only the non-adaptive
estimators were studied in the 9 time-series sampling schemes.
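The autoregressive scheme is simple to reproduce. The following minimal
Python sketch builds one series of length 20 from a list of shocks; the
function name is ours, and Python's built-in generator stands in for the
multiplicative congruential generator of the study, so the numbers it
produces are illustrative only.

```python
import random


def ar1_series(shocks, rho, x0=0.0):
    """X_j = rho * X_{j-1} + e_j for j = 1, ..., len(shocks), from X_0 = x0."""
    series = []
    previous = x0
    for shock in shocks:
        previous = rho * previous + shock
        series.append(previous)
    return series


rng = random.Random(0)  # stand-in generator, not the one used in the study
normal_shocks = [rng.gauss(0.0, 1.0) for _ in range(20)]
strongly_correlated = ar1_series(normal_shocks, rho=0.9)
```

Because the series starts at the fixed value 0 rather than from its
stationary distribution, the first few observations are less variable
than the later ones when ρ is large.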

The results we report are based on 1000 Monte-Carlo
replications. The sample variances of the estimators were calculated
and then scaled by a factor of 20 (the sample size) to make the
results comparable to those in the PRS. Assuming approximate
normality for the estimators, the variance of 20 times the sample
variance would be about $.7992\,\sigma_T^4$, where $\sigma_T^2$ is
the true variance of the estimator, T. For example, if
$T = \bar{X}$, then $\sigma_T^2$ is $\tfrac{1}{20}$ and the variance
of 20 times the sample variance is .001998. Thus one may expect
about one decimal place of accuracy, with correspondingly less
significance as the value $\sigma_T^2$ climbs. We report two decimal
places, in most cases, because all estimators were calculated over
the same samples. Without attempting inferences outside these
samples, the extra decimal place(s) are meaningful.
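The constants quoted above follow from treating 1000 times the sample
variance, divided by the true variance, as chi-square with 999 degrees
of freedom, whose variance is 2(999). The short Python check below
reproduces them; the variable names are ours.

```python
import math

REPLICATIONS = 1000
SAMPLE_SIZE = 20

# Var(s^2) = 2 * 999 / 1000^2 * sigma^4 under the chi-square approximation.
variance_factor = 2.0 * (REPLICATIONS - 1) / REPLICATIONS**2

# Scaling the sample variance by 20 multiplies its variance by 20^2.
scaled_factor = SAMPLE_SIZE**2 * variance_factor

# For the sample mean of 20 unit-variance observations, sigma_T^2 = 1/20.
sigma2_mean = 1.0 / SAMPLE_SIZE
variance_of_scaled = scaled_factor * sigma2_mean**2
std_dev_of_scaled = math.sqrt(variance_of_scaled)
```

The resulting standard deviation of roughly .045 on a tabled value near
1 is what justifies the statement that about one decimal place is
reliable.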

4.  THE  RESULTS 

The main results of the Monte Carlo study are summarized in 2
tables. Table III is a table of sample variances (over the 1000
Monte-Carlo replications) for the 19 independent sampling schemes
and 19 estimators of location. The symbol *** in Table III refers
to a variance exceeding 100.00. Table IV is a table of biases and
variances for the 9 sampling schemes involving first-order
autoregressive variates. None of the Hogg-type adaptive estimators
is listed, for the two-fold reason that the choice of $Q_1$ and
$Q_2$ is predicated on independent observations and that the basic
estimators (trimmed means, hampels, etc.) perform poorly. Some
[TABLE III, first part: sample variances for the first nine
independent sampling situations; the entries are not recoverable.]

TABLE III (continued)

Sample variances, scaled by 20, over 1000 Monte-Carlo replications.
*** denotes a variance exceeding 100.00.

No. Code    N+.25N  N+.05N   N+.10N   N+.25N   N+.03C  N+.10C  N+.50C  C      N+.10   N+.25
            (0,9)   (0,100)  (0,100)  (0,100)                                 N/U     N/U

 1  10%     2.02    1.22     2.16     9.12     1.13    1.22    2.70    8.34   1.23    1.80
 2  50%     2.08    1.58     1.94     2.56     1.55    1.59    2.06    2.83   1.61    2.00
 3  M       2.96    5.45     11.5     26.0     72.9    ***     ***     ***    ***     ***
 4  OM      5.50    17.3     38.1     85.3     ***     ***     ***     ***    ***     ***
 5  12A     1.87    1.27     1.44     2.48     1.26    1.30    1.68    2.76   1.29    1.58
 6  17A     1.89    1.18     1.37     2.82     1.17    1.22    1.72    3.21   1.21    1.51
 7  21A     1.96    1.16     1.36     3.11     1.14    1.21    1.73    3.58   1.19    1.50
 8  25A     2.07    1.15     1.39     3.60     1.11    1.18    1.89    3.93   1.18    1.53
 9  ADA     1.98    1.19     1.42     2.67     1.15    1.24    1.74    3.01   1.23    1.57
10  HG1     2.14    1.28     1.74     4.26     1.16    1.30    1.88    3.30   1.31    1.69
11  HG2     1.93    1.29     1.66     2.80     1.20    1.30    1.84    2.83   1.30    1.66
12  1.81*A  1.97    1.19     1.42     2.59     1.15    1.24    1.69    2.84   1.22    1.54
13  1.90A   2.01    1.17     1.38     2.86     1.12    1.21    1.75    3.30   1.20    1.52
14  1.95A   2.01    1.16     1.38     3.01     1.12    1.20    1.74    3.31   1.20    1.52
15  2.00A   1.94    1.18     1.42     2.77     1.15    1.23    1.74    2.98   1.21    1.49
16  H/L     1.97    1.22     1.65     4.04     1.14    1.24    2.01    4.57   1.25    1.73
17  RN      2.24    1.28     1.95     5.93     1.12    1.28    2.46    6.76   1.31    1.93
18  JOH     2.19    1.24     1.52     4.38     1.16    1.27    1.73    3.25   1.28    1.62
19  5T4     1.98    1.24     1.42     3.73     1.20    1.25    1.66    3.00   1.26    1.54

preliminary  calculations  indicate  that  Hogg-type  adaptors  perform 
similarly  under  the  time  series  alternatives. 

We  withhold  discussion  of  these  results  to  section  5. 

5.  DISCUSSION  AND  CONCLUSION 

Table III contains many estimators with fairly comparable
variance. In order to make the better estimators more apparent,
we have constructed an additional table, Table V. Let $T_i$,
$i = 1, \ldots, 1000$, represent the 1000 observations of an
estimator of location, and let

$$s_T^2 = \frac{1}{1000} \sum_{i=1}^{1000} \left(T_i - \bar{T}\right)^2$$

be the sample variance. Assuming the $T_i$'s are approximately
normal, then $1000\, s_T^2 / \sigma_T^2$ is distributed approximately
as a $\chi^2$ with 999 degrees of freedom. Accordingly, the variance
of $s_T^2$ is $.001998\,\sigma_T^4$ and the variance of $20 s_T^2$ is
$.7992\,\sigma_T^4$, as pointed out earlier. An estimate of the
standard deviation of $s_T^2$ can be found by calculating
$\sqrt{.001998}\; s_T^2 = .0447\, s_T^2$. Table V was constructed by
calculating the estimated standard deviation for the estimator with
the smallest variance.

If an estimator had a variance that fell within one standard
deviation of the minimal variance, then it was replaced by a "0"
in Table V. If the variance was more than one standard deviation
but less than two, it was replaced with a "1" in Table V.
Similarly for "2", "3" and "4". Finally, if the variance was more
than 5 standard deviations away from the minimal variance, it was
simply replaced by "5". Thus Table V leaves a clear picture of
the stronger estimators.
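The coding rule for Table V can be written out as a short Python
sketch. The function name is ours, and the use of the integer part of
the standardized distance (capped at 5) is our reading of the rule
described above; applied to one column of variances it returns the
digits that would appear in that column.

```python
import math


def table_v_codes(variances, replications=1000):
    """Code each variance by its distance from the column minimum,
    measured in estimated standard deviations of the minimum."""
    smallest = min(variances)
    # sqrt(2 * 999) / 1000 = .0447, the factor quoted in the text.
    std_dev = math.sqrt(2.0 * (replications - 1)) / replications * smallest
    return [min(5, int((v - smallest) / std_dev)) for v in variances]


codes = table_v_codes([1.00, 1.04, 1.05, 1.30])
```

Because the yardstick is proportional to the smallest variance, the
codes are unchanged if every variance in a column is multiplied by the
same constant, such as the factor of 20 used in Table III.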

In terms of the 4 light-tailed alternatives, the estimator of
choice appears to be the outer mean. From Table V we observe that
HG1, RN and JOH also perform creditably but significantly more
poorly than OM.

We believe that the uniform is a highly artificial situation.
The contamination of a normal by a distribution whose support is a
bounded interval seems less so. While this very small selection
of short-tailed alternatives is inadequate for sweeping
commitments to certain types of estimators, it is very suggestive of
what is reasonable. It is clear that most estimators do very poorly
in short-tailed situations. Some authors suggest that whenever
short-tailed alternatives arise, we will be able to recognize them
and, having recognized them, use some appropriate estimator such as
OM. We believe that this is really a very crude form of adaptive
procedure based on intuition or some other form of non-statistical
knowledge. It is therefore not really satisfactory, and we believe
some more formal procedures such as Hogg's procedure are desirable.


Just as dramatic as the OM's good performance in the
short-tailed situations is its bad performance in every long-tailed
situation. Table V also distinguishes several classes of strong
performers in heavy-tailed situations. As a class, the hampels
appear strong in spite of the appearance of 5's. The adaptive
hampels also appear very strong, with the added bonus of no 5's in
long-tailed situations.

In the PRS, the concept of deficiency is introduced in order
to provide a comparison of two estimators. The efficiency of an
estimator under test relative to a standard estimator is defined by

$$\text{efficiency} = \frac{\text{variance of standard}}
{\text{variance of estimator under test}}$$

and the deficiency by

$$\text{deficiency} = 1 - \text{efficiency}.$$

Notice that a negative deficiency means the estimator is more
efficient than the standard. Thus deficiency is centered at 0, with
negative meaning the standard is less efficient and positive
meaning more efficient.

We feel the advantage of the zero reference point is
outweighed by the following anomaly. In a sense, efficiencies of 2
and 1/2 mean the same thing (interchanging the roles of the
standard estimator with the test estimator). The corresponding
deficiencies of -1 and 1/2 are not symmetrically located about 0
and, on the intuitive level, not apparently related.

Rather than compute deficiency, we have computed the natural
logarithm of the efficiency ratio, which also has a 0 reference
(for efficiency 1) and is symmetrical in the sense that
$\ln 2 = -\ln \tfrac{1}{2}$. The logarithm of the efficiency also
has the advantage that in order to shift standards, only a simple
subtraction is necessary. Table VI gives logarithms of efficiencies
for a variety of the better performing estimators relative to
1.95A. To illustrate the change of standard, the log efficiency of
25A relative to 1.95A under the Cauchy alternative is -.172 while
the log efficiency of 1.81*A relative to 1.95A is .153. Thus the
log efficiency of 25A relative to 1.81*A is just
-.172 - .153 = -.325. Note that a negative log efficiency means the
test estimator is less efficient than the standard.
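The change of standard can be verified numerically. The minimal Python
sketch below uses the Cauchy variances 3.31, 3.93 and 2.84 that
Table III reports for 1.95A, 25A and 1.81*A, as we read them; the
function name is ours.

```python
import math


def log_efficiency(var_standard, var_test):
    """Natural log of (variance of standard / variance of test estimator)."""
    return math.log(var_standard / var_test)


var_195a, var_25a, var_181a = 3.31, 3.93, 2.84  # Cauchy column, Table III

le_25a = log_efficiency(var_195a, var_25a)     # 25A relative to 1.95A
le_181a = log_efficiency(var_195a, var_181a)   # 1.81*A relative to 1.95A

# Shifting the standard from 1.95A to 1.81*A is a subtraction.
le_25a_vs_181a = le_25a - le_181a
direct = log_efficiency(var_181a, var_25a)
```

The subtraction and the direct computation agree because the variance
of the old standard cancels in the difference of logarithms.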

In Table VI, 1.95A was used as a standard although it was not
uniformly the best. However, we feel it adequately represents the
class of adaptive hampels. For the moment, let us consider Table
VI based on variances.


TABLE VI

Logarithm of the Efficiency Relative to 1.95A.

                 17A    21A    25A    ADA    1.81*A  1.90A  2.00A  H/L    5T4

N(0,1)          -.029   .019   .000  -.029  -.019    .000   .010   .000  -.047
N+.01N(0,9)     -.069  -.030   .000   .040  -.030   -.010  -.040  -.030  -.069
N+.025N(0,9)    -.034  -.017   .009  -.026   .034   -.009  -.017   .000  -.051
N+.05N(0,9)     -.024   .000   .008  -.031  -.016   -.008   .000  -.016  -.062
N+.10N(0,9)      .000   .007   .007  -.028  -.014   -.007   .000   .007  -.056
N+.15N(0,9)      .015   .020   .007  -.020   .013    .007   .013   .007  -.032
N+.25N(0,9)      .062   .025  -.029   .015   .020    .000   .035   .020   .015
N+.05N(0,100)   -.017   .000   .009  -.026  -.026   -.008  -.017  -.050  -.067
N+.10N(0,100)    .007   .015  -.007  -.029  -.029    .000  -.029  -.179  -.029
N+.25N(0,100)    .065  -.033  -.179   .120   .150    .051   .083  -.294  -.214
N+.05C          -.044  -.018   .009  -.026  -.026    .000  -.026  -.018  -.069
N+.10C          -.015  -.008   .017  -.033  -.033   -.008  -.025  -.033  -.041
N+.50C           .012   .006  -.083   .000   .029   -.006   .000  -.144   .047
C                .031  -.078  -.172   .095   .153    .003   .105  -.322   .098
N+.10N/U        -.008   .008   .017  -.025  -.017    .000  -.008  -.041  -.049
N+.25N/U         .007   .013  -.007  -.032  -.013    .000   .020  -.129  -.013


When compared to ADA, H/L and 5T4, 1.95A dominates in the
sense that there are more negatives than positives. In these cases,
the magnitude of the negative log efficiency is generally much
greater than that of the positive log efficiency. For example, the
largest negative log efficiency for 5T4 is -.214 while the
largest positive log efficiency is only .098. The only clear
exception to this rule in these cases is the positive .120 for ADA.
We can point out that relative to 1.81*A, ADA does not exhibit this
behavior.

When compared to the hampels, the adaptive hampels do not
usually dominate in terms of sign. That the adaptive hampels do
not uniformly dominate is an entirely reasonable outcome. The
adaptive estimators are, after all, dynamic averages of the
non-adaptive versions. Therefore, we could not reasonably expect to
achieve the very best behavior of the non-adaptives, but we should
be able to get quite close. Moreover, adaptation allows us to avoid
the worst pitfalls. Thus, the worst case behavior still applies.
For example, 25A has a positive log efficiency in 10 of the 16
cases. However, the maximum positive log efficiency is only .017
whereas the most negative log efficiency is -.179. For 22A
the maximum positive log efficiency is .015 while the worst
negative case is -.051, and so on.

To  summarize,  with  adaptive  hampels,  one  may  lose  a slight  bit 
of  efficiency  in  some  cases,  but  gain  a rather  large  amount  in 
others.  In  this  sense  if  hampels  are  good,  adaptive  hampels  are 
better. 

There is also some question in our minds whether $Q_1$ and $Q_2$
are the most suitable measures of heavy-tailedness. In particular,
as Hogg has suggested in a private communication, although a sample
may be drawn from a symmetric population, the sample may have
significant asymmetries. Thus an adaptive procedure accounting for
such asymmetries should be able to improve procedures such as HG1
and HG2 as well as the adaptive hampels. Also, we feel work needs
to be done in adapting to light tails. A more sensitive measure
of light-tailedness may improve the efficiencies there also. In
regard to adaptive estimation, we must take exception to the PRS.
Properly formulated, blatantly adaptive estimators perform at least
as well as non-adaptors for sample sizes less than 40.

Finally we may address ourselves to Table IV and the time
series alternative. A word on the design of the sampling
situations is in order. For the first-order autoregressive model

$$x_t = \rho\, x_{t-1} + \epsilon_t, \qquad t = 1, 2, \ldots, n,$$

a negative value of $\rho$ guarantees a spectrum dominated by high
frequencies. In this circumstance observations will occur on both
"sides" of the center. For positive values of $\rho$ low frequencies
dominate. Hence for positive $\rho$, particularly for $\rho$ close
to 1, the time series could make long excursions on one "side" of
the center. Thus the difficulty in location estimation in a time
series arises close to nonstationarity. Gastwirth and Rubin (1975)
discuss robust estimation in precisely this situation. In
particular, they derive efficiencies of several estimators relative
to the mean M. Unfortunately, as Table IV clearly demonstrates, M is
on the whole unsatisfactory. It is our assessment that no estimator
presently fashionable is very satisfactory in the presence of
positively correlated data.

A more complicated time series situation might have been to
examine a time series which is contaminated by (possibly)
uncorrelated observations. The general performance as seen in
Table IV did not seem to warrant this investigation, however.